Non-parametric estimation in a 
semimartingale regression model. 
Part 1. Oracle Inequalities. * 

Konev Victor† and Pergamenshchikov Serguei‡ 
September 17, 2009 

Abstract 

This paper considers the problem of estimating a periodic function 
in a continuous time regression model with a general square integrable 
semimartingale noise. A model selection adaptive procedure is pro- 
posed. Sharp non-asymptotic oracle inequalities have been derived. 

Keywords: Non-asymptotic estimation; Non-parametric regression; Model 
selection; Sharp oracle inequality; Semimartingale noise. 

AMS 2000 Subject Classifications: Primary: 62G08; Secondary: 62G05 

*The paper is supported by the RFFI-Grant 09-01-00172-a. 

† Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin 
str. 36, 634050 Tomsk, Russia, e-mail: vvkonev@mail.tsu.ru 

‡ Laboratoire de Mathématiques Raphaël Salem, Avenue de l'Université, BP. 12, 
Université de Rouen, F76801, Saint Etienne du Rouvray, Cedex France and Department of 
Mathematics and Mechanics, Tomsk State University, Lenin str. 36, 634041 Tomsk, Russia, 
e-mail: Serge.Pergamenchtchikov@univ-rouen.fr 






1 Introduction 

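Consider a regression model in continuous time 

$$ dy_t = S(t)\,dt + d\xi_t , \quad 0 \le t \le n , \tag{1.1} $$

where $S$ is an unknown 1-periodic $\mathbb{R} \to \mathbb{R}$ function, $S \in \mathcal{L}_2[0,n]$; 
$(\xi_t)_{t \ge 0}$ is a square integrable unobservable semimartingale noise such that for any 
function $f$ from $\mathcal{L}_2[0,n]$ the stochastic integral 

$$ I_n(f) = \int_0^n f_s \, d\xi_s \tag{1.2} $$

is well defined with 

$$ \mathbf{E}\, I_n(f) = 0 \quad \text{and} \quad \mathbf{E}\, I_n^2(f) \le \sigma^* \int_0^n f_s^2 \, ds , \tag{1.3} $$

where $\sigma^*$ is some positive constant. 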

An important example of the disturbance $(\xi_t)_{t \ge 0}$ is the following process 

$$ \xi_t = \varrho_1 w_t + \varrho_2 z_t , \tag{1.4} $$

where $\varrho_1$ and $\varrho_2$ are unknown constants, $|\varrho_1| + |\varrho_2| > 0$, $(w_t)_{t \ge 0}$ is a standard 
Brownian motion, $(z_t)_{t \ge 0}$ is a compound Poisson process defined as 

$$ z_t = \sum_{j=1}^{N_t} Y_j , \tag{1.5} $$

where $(N_t)_{t \ge 0}$ is a standard homogeneous Poisson process with unknown 
intensity $\lambda > 0$ and $(Y_j)_{j \ge 1}$ is an i.i.d. sequence of random variables with 

$$ \mathbf{E}\, Y_j = 0 \quad \text{and} \quad \mathbf{E}\, Y_j^2 = 1 . \tag{1.6} $$

Let $(T_k)_{k \ge 1}$ denote the arrival times of the process $(N_t)_{t \ge 0}$, that is, 

$$ T_k = \inf\{t \ge 0 : N_t = k\} . \tag{1.7} $$

As is shown in Lemma A.2, the condition (1.3) holds for the noise (1.4) with 
$\sigma^* = \varrho_1^2 + \lambda \varrho_2^2$. 

The problem is to estimate the unknown function $S$ in the model (1.1) 
on the basis of observations $(y_t)_{0 \le t \le n}$. 
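
As a quick numerical illustration of the variance constant $\sigma^* = \varrho_1^2 + \lambda \varrho_2^2$, the following minimal sketch draws the noise (1.4) at a fixed time $t$ and compares the empirical second moment with $\sigma^* t$, which is the value of $\mathbf{E}\, I_t^2(f)$ for $f \equiv 1$. The Rademacher law for the jumps and all function names are assumptions made for illustration only; the model itself only requires $\mathbf{E}\, Y_j = 0$ and $\mathbf{E}\, Y_j^2 = 1$.

```python
import math
import random


def poisson_count(rng, mean):
    """Draw a Poisson(mean) variate by counting unit-rate arrivals in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_noise_at(t, rho1, rho2, lam, rng):
    """Draw xi_t = rho1 * w_t + rho2 * z_t for one fixed time t.

    Illustrative assumption: the jumps Y_j are Rademacher (+1 or -1),
    so that E Y_j = 0 and E Y_j^2 = 1.
    """
    w_t = rng.gauss(0.0, math.sqrt(t))      # Brownian motion at time t
    n_jumps = poisson_count(rng, lam * t)   # N_t is Poisson with mean lam * t
    z_t = sum(rng.choice((-1.0, 1.0)) for _ in range(n_jumps))
    return rho1 * w_t + rho2 * z_t


def sigma_star(rho1, rho2, lam):
    """Variance constant sigma* = rho1^2 + lam * rho2^2."""
    return rho1 ** 2 + lam * rho2 ** 2


def empirical_second_moment(t, rho1, rho2, lam, n_samples=20000, seed=0):
    """Monte Carlo estimate of E xi_t^2, to be compared with sigma* * t."""
    rng = random.Random(seed)
    draws = (sample_noise_at(t, rho1, rho2, lam, rng) for _ in range(n_samples))
    return sum(x * x for x in draws) / n_samples
```

For instance, `empirical_second_moment(2.0, 0.5, 1.0, 3.0)` should be close to `2.0 * sigma_star(0.5, 1.0, 3.0)`, up to Monte Carlo error.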

Solving this problem also solves the following problem of functional statistics. 
Let observations $(x^k)_{1 \le k \le n}$ be a segment of a sequence of 
independent identically distributed random processes $x^k = (x^k_t)_{0 \le t \le 1}$ specified 
on the interval $[0,1]$, which obey the stochastic differential equations 

$$ dx^k_t = S(t)\,dt + d\xi^k_t , \quad x^k_0 = x_0 , \quad 0 \le t \le 1 , \tag{1.8} $$

where $(\xi^k)_{1 \le k \le n}$ is an i.i.d. sequence of random processes $\xi^k = (\xi^k_t)_{0 \le t \le 1}$ with 
the same distribution as the process (1.4). The problem is to estimate the 
unknown function $S(t) \in \mathcal{L}_2[0,1]$ on the basis of observations $(x^k)_{1 \le k \le n}$. This 
model can be reduced to (1.1), (1.4) in the following way. Let $y = (y_t)_{0 \le t \le n}$ 
denote the process defined as 

$$ y_t = \begin{cases} x^1_t , & \text{if } 0 \le t \le 1 ; \\ y_{k-1} + x^k_{t-k+1} - x_0 , & \text{if } k-1 \le t \le k , \ 2 \le k \le n . \end{cases} $$

This process satisfies the stochastic differential equation 

$$ dy_t = \widetilde{S}(t)\,dt + d\widetilde{\xi}_t , $$

where $\widetilde{S}(t) = S(\{t\})$ and 

$$ \widetilde{\xi}_t = \begin{cases} \xi^1_t , & \text{if } 0 \le t \le 1 ; \\ \widetilde{\xi}_{k-1} + \xi^k_{t-k+1} , & \text{if } k-1 \le t \le k , \ 2 \le k \le n ; \end{cases} $$

$\{t\} = t - [t]$ is the fractional part of the number $t$. 
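
The reduction of the functional model to a single trajectory on $[0,n]$ can be sketched on a discrete grid as follows. This is only an illustration under the assumption that every path $x^k$ is sampled at the same grid points $0, 1/m, \ldots, 1$ and starts from the common value $x_0$; the function name is hypothetical.

```python
def concatenate_paths(paths):
    """Glue sampled paths x^1, ..., x^n into one trajectory y on [0, n].

    paths[k] holds the values of x^{k+1} at the grid points 0, 1/m, ..., 1.
    On [k-1, k] the rule y_t = y_{k-1} + x^k_{t-k+1} - x_0 is applied, so the
    increments of each path are preserved and y is continuous at integer times.
    """
    y = list(paths[0])
    for path in paths[1:]:
        x0 = path[0]      # common initial value of the path
        last = y[-1]      # value y_{k-1} reached so far
        y.extend(last + value - x0 for value in path[1:])
    return y
```

For example, `concatenate_paths([[1, 2, 4], [1, 0, 3]])` glues the second path onto the end value 4 of the first one.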

In this paper we will consider the estimation problem for the regression 
model (1.1) in $\mathcal{L}_2[0,1]$ with the quality of an estimate $\widehat{S}$ being measured by 
the mean integrated squared error (MISE) 

$$ \mathcal{R}(\widehat{S}, S) := \mathbf{E}_S \, \|\widehat{S} - S\|^2 , \tag{1.9} $$

where $\mathbf{E}_S$ stands for the expectation with respect to the distribution $\mathbf{P}_S$ of 
the process (1.1) given $S$; 

$$ \|f\|^2 := \int_0^1 f^2(t)\,dt . $$

It is natural to treat this problem from the standpoint of the model selection 
approach. The origin of this method goes back to the early seventies with the 
pioneering papers by Akaike [1] and Mallows [16], who proposed introducing 
a penalty into a log-likelihood type criterion. Further progress was 
made by Barron, Birge and Massart [2], [17], who developed a non-asymptotic 
model selection method which enabled one to derive non-asymptotic oracle 
inequalities for a gaussian non-parametric regression model with i.i.d. 
disturbance. An oracle inequality yields an upper bound for the risk of an 
estimate in terms of the minimal risk over a chosen family of estimates. 
Galtchouk and Pergamenshchikov [6] developed the Barron-Birge-Massart 
technique treating the problem of estimating a non-parametric drift function 
in a diffusion process from the standpoint of sequential analysis. Fourdrinier 
and Pergamenshchikov [5] extended the Barron-Birge-Massart method to the 
models with dependent observations and, in contrast to all above-mentioned 
papers on the model selection method, where the estimation procedures were 
based on the least squares estimates, they proposed to use an arbitrary family 
of projective estimates in an adaptive estimation procedure, and they discov- 
ered that one can employ the improved least square estimates to increase the 
estimation quality. Konev and Pergamenshchikov [14] applied this improved 
model selection method to the non-parametric estimation problem of a pe- 
riodic function in a model with a coloured noise in continuous time having 
unknown spectral characteristics. In all cited papers the non-asymptotic or- 
acle inequalities have been derived which enable one to establish the optimal 
convergence rate for the minimax risks. Moreover, in the latter paper the 
oracle inequalities have been found for the robust risks. 

In addition to the optimal convergence rate, an important problem is 
that of the efficiency of adaptive estimation procedures. In order to examine 
the efficiency property one has to obtain the oracle inequalities in which the 
principal term has the factor close to unity. 

The first result in this direction is most likely due to Kneip [13] who ob- 
tained, for a gaussian regression model, the oracle inequality with the factor 
close to unity at the principal term. The oracle inequalities of this type were 
obtained as well in [3] and in [4] for the inverse problems. It will be observed 
that the derivation of oracle inequalities in all these papers rests upon the 
fact that by applying the Fourier transformation one can reduce the initial 
model to the statistical gaussian model with independent observations. Such 
a transform is possible only for gaussian models with independent homogeneous 
observations or for the inhomogeneous ones with the known correlation 
characteristics. This restriction significantly narrows the area of application 
of such estimation procedures and rules out a broad class of models including, 
in particular, the heteroscedastic regression models widely used in econometrics 
(see, for example, [12]). For constructing adaptive procedures in the case of 
inhomogeneous observations one needs to amend the approach to the estimation 
problem. Galtchouk and Pergamenshchikov [7]-[9] have developed a new 
estimation method intended for the heteroscedastic regression models. The 
heart of this method is to combine the Barron-Birge-Massart non-asymptotic 
penalization method [2] and the Pinsker weighted least square method minimizing 
the asymptotic risk (see, for example, [18], [19]). Combining these 
approaches results in a significant improvement of the estimation quality 
(see numerical example in [7]). As was shown in [8] and [9], the Galtchouk-
Pergamenshchikov procedure is efficient with respect to the robust minimax 
risk, i.e. the minimax risk with the additional supremum operation over the 
whole family of admissible model distributions. In the sequel [10], [11], this 
approach has been applied to the problem of a drift estimation in a diffusion 
process. In this paper we apply this procedure to the estimation of a regression 
function $S$ in a semimartingale regression model (1.1). The rest of the 
paper is organized as follows. In Section 2 we construct the model selection 
procedure on the basis of weighted least squares estimates and state the main 
results in the form of oracle inequalities for the quadratic risks. Section 3 
gives the proofs of all theorems. In the Appendix some technical results are 
established. 

2 Model selection 

This section gives the construction of a model selection procedure for estimating 
a function $S$ in (1.1) on the basis of weighted least square estimates 
and states the main results. 

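For estimating the unknown function $S$ in model (1.1), we apply its 
Fourier expansion in the trigonometric basis $(\phi_j)_{j \ge 1}$ in $\mathcal{L}_2[0,1]$ defined as 

$$ \phi_1 = 1 , \quad \phi_j(x) = \sqrt{2}\, Tr_j(2\pi [j/2] x) , \quad j \ge 2 , \tag{2.1} $$

where the function $Tr_j(x) = \cos(x)$ for even $j$ and $Tr_j(x) = \sin(x)$ for odd 
$j$; $[x]$ denotes the integer part of $x$. The corresponding Fourier coefficients 

$$ \theta_j = (S, \phi_j) = \int_0^1 S(t)\, \phi_j(t)\, dt \tag{2.2} $$

can be estimated as 

$$ \widehat{\theta}_{j,n} = \frac{1}{n} \int_0^n \phi_j(t)\, dy_t . \tag{2.3} $$

In view of (1.1), we obtain 

$$ \widehat{\theta}_{j,n} = \theta_j + \frac{1}{\sqrt{n}}\, \xi_{j,n} , \quad \xi_{j,n} = \frac{1}{\sqrt{n}}\, I_n(\phi_j) , \tag{2.4} $$

where $I_n$ is given in (1.2). 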
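
The basis (2.1) and the estimate $\frac{1}{n}\int_0^n \phi_j(t)\,dy_t$ of the Fourier coefficients can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the trajectory is observed through increments over a uniform grid of step `dt`, and it replaces the stochastic integral by a left-point Riemann sum; the function names are hypothetical.

```python
import math


def phi(j, x):
    """Trigonometric basis: phi_1 = 1, phi_j(x) = sqrt(2) * Tr_j(2*pi*[j/2]*x)."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    # Tr_j is the cosine for even j and the sine for odd j
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def theta_hat(j, increments, dt):
    """Riemann-sum approximation of (1/n) * int_0^n phi_j(t) dy_t.

    increments[k] is the observed increment y_{(k+1)dt} - y_{k dt}.
    """
    horizon = len(increments) * dt  # observation horizon n
    total = sum(phi(j, k * dt) * dy for k, dy in enumerate(increments))
    return total / horizon
```

In the noise-free case $dy_t = S(t)\,dt$ with $S = 1 + 0.5\,\phi_2$, the sketch returns values close to $1$, $0.5$ and $0$ for $j = 1, 2, 3$, in agreement with (2.2).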

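For any sequence $x = (x_j)_{j \ge 1}$, we set 

$$ |x|^2 = \sum_{j=1}^{\infty} x_j^2 \quad \text{and} \quad \#(x) = \sum_{j=1}^{\infty} \mathbf{1}_{\{|x_j| > 0\}} . \tag{2.5} $$

Now we impose the additional conditions on the noise $(\xi_t)_{t \ge 0}$. 

$\mathbf{C}_1$) There exists some positive constant $\sigma > 0$ such that the sequence 

$$ \varsigma_{j,n} = \mathbf{E}\, \xi_{j,n}^2 - \sigma , \quad j \ge 1 , $$

for any $n \ge 1$, satisfies the following inequality 

$$ c_1^*(n) = \sup_{x \in \mathcal{H} ,\, \#(x) \le n} |B_{1,n}(x)| < \infty , $$

where $\mathcal{H} = [-1,1]^{\infty}$ and 

$$ B_{1,n}(x) = \sum_{j=1}^{\infty} x_j\, \varsigma_{j,n} . \tag{2.6} $$

$\mathbf{C}_2$) Assume that for all $n \ge 1$ 

$$ c_2^*(n) = \sup_{|x| \le 1 ,\, \#(x) \le n} \mathbf{E}\, B_{2,n}^2(x) < \infty , $$

where 

$$ B_{2,n}(x) = \sum_{j=1}^{\infty} x_j\, \widetilde{\xi}_{j,n} \quad \text{with} \quad \widetilde{\xi}_{j,n} = \xi_{j,n}^2 - \mathbf{E}\, \xi_{j,n}^2 . \tag{2.7} $$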

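As is stated in Theorem 2.2, Conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) hold for the process 
(1.4). Further we introduce a class of weighted least squares estimates for 
$S(t)$ defined as 

$$ \widehat{S}_\gamma = \sum_{j=1}^{\infty} \gamma(j)\, \widehat{\theta}_{j,n}\, \phi_j , \tag{2.8} $$

where $\gamma = (\gamma(j))_{j \ge 1}$ is a sequence of weight coefficients such that 

$$ 0 \le \gamma(j) \le 1 \quad \text{and} \quad 0 < \#(\gamma) \le n . \tag{2.9} $$

Let $\Gamma$ denote a finite set of weight sequences $\gamma = (\gamma(j))_{j \ge 1}$ with these 
properties, $\nu = \mathrm{card}(\Gamma)$ be its cardinal number and 

$$ \mu = \max_{\gamma \in \Gamma} \#(\gamma) . \tag{2.10} $$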

The model selection procedure for the unknown function $S$ in (1.1) will be 
constructed on the basis of estimates $(\widehat{S}_\gamma)_{\gamma \in \Gamma}$. The choice of a specific set of 
weight sequences $\Gamma$ will be discussed at the end of this section. In order to 
find a proper weight sequence $\gamma$ in the set $\Gamma$ one needs to specify a cost 
function. When choosing an appropriate cost function one can use the following 
argument. The empirical squared error 

$$ \mathrm{Err}_n(\gamma) = \|\widehat{S}_\gamma - S\|^2 $$

can be written as 

$$ \mathrm{Err}_n(\gamma) = \sum_{j=1}^{\infty} \gamma^2(j)\, \widehat{\theta}_{j,n}^2 - 2 \sum_{j=1}^{\infty} \gamma(j)\, \widehat{\theta}_{j,n}\, \theta_j + \sum_{j=1}^{\infty} \theta_j^2 . \tag{2.11} $$

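Since the Fourier coefficients $(\theta_j)_{j \ge 1}$ are unknown, the weight coefficients 
$(\gamma(j))_{j \ge 1}$ cannot be determined by minimizing this quantity. To circumvent 
this difficulty one needs to replace the terms $\widehat{\theta}_{j,n}\, \theta_j$ by some estimators of them, 

$$ \widetilde{\theta}_{j,n} = \widehat{\theta}_{j,n}^2 - \frac{\widehat{\sigma}_n}{n} , \tag{2.12} $$

where $\widehat{\sigma}_n$ is an estimator for the quantity $\sigma$ in condition $\mathbf{C}_1$). 

For this change in the empirical squared error, one has to pay some 
penalty. Thus, one comes to the cost function of the form 

$$ J_n(\gamma) = \sum_{j=1}^{\infty} \gamma^2(j)\, \widehat{\theta}_{j,n}^2 - 2 \sum_{j=1}^{\infty} \gamma(j)\, \widetilde{\theta}_{j,n} + \rho\, \widehat{P}_n(\gamma) , \tag{2.13} $$

where $\rho$ is some positive constant, $\widehat{P}_n(\gamma)$ is the penalty term defined as 

$$ \widehat{P}_n(\gamma) = \frac{\widehat{\sigma}_n\, |\gamma|^2}{n} . \tag{2.14} $$

In the case when the value of $\sigma$ in $\mathbf{C}_1$) is known, one can put $\widehat{\sigma}_n = \sigma$ and 

$$ P_n(\gamma) = \frac{\sigma\, |\gamma|^2}{n} . \tag{2.15} $$

Substituting the weight coefficients that minimize the cost function, that is 

$$ \widehat{\gamma} = \mathrm{argmin}_{\gamma \in \Gamma}\, J_n(\gamma) , \tag{2.16} $$

in (2.8) leads to the model selection procedure 

$$ \widehat{S}_* = \widehat{S}_{\widehat{\gamma}} . \tag{2.17} $$

It will be noted that $\widehat{\gamma}$ exists, since $\Gamma$ is a finite set. If the minimizing sequence 
$\widehat{\gamma}$ in (2.16) is not unique, one can take any minimizer. 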
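
The selection step can be illustrated by the following minimal sketch. It assumes the cost function has the form $J_n(\gamma) = \sum_j \gamma^2(j)\widehat{\theta}_{j,n}^2 - 2\sum_j \gamma(j)(\widehat{\theta}_{j,n}^2 - \widehat{\sigma}_n/n) + \rho\,\widehat{\sigma}_n|\gamma|^2/n$, and that the estimated coefficients and the finite family of weight sequences are given as plain Python sequences of equal length; the function names are hypothetical.

```python
def cost(gamma, theta_hat, sigma_hat, n, rho):
    """Penalized cost J_n(gamma) for one weight sequence gamma."""
    fit = sum(g * g * th * th for g, th in zip(gamma, theta_hat))
    # theta_tilde_j = theta_hat_j^2 - sigma_hat / n replaces theta_hat_j * theta_j
    cross = sum(g * (th * th - sigma_hat / n) for g, th in zip(gamma, theta_hat))
    penalty = sigma_hat * sum(g * g for g in gamma) / n
    return fit - 2.0 * cross + rho * penalty


def select_weights(family, theta_hat, sigma_hat, n, rho):
    """Return a minimizer of the cost over the finite family of weight sequences."""
    return min(family, key=lambda gamma: cost(gamma, theta_hat, sigma_hat, n, rho))
```

For projection weights (each $\gamma(j)$ equal to 0 or 1) the cost reduces to a sum of the terms $-\widehat{\theta}_{j,n}^2 + (2+\rho)\widehat{\sigma}_n/n$ over the retained indices, so a coefficient is kept only when its square exceeds the threshold $(2+\rho)\widehat{\sigma}_n/n$.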

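Theorem 2.1. Assume that the conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) hold with $\sigma > 0$. 
Then for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (2.17) satisfies the oracle 
inequality 

$$ \mathcal{R}(\widehat{S}_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma \in \Gamma} \mathcal{R}(\widehat{S}_\gamma, S) + \frac{1}{n}\, \mathcal{B}_n(\rho) , \tag{2.18} $$

where the risk $\mathcal{R}(\cdot, S)$ is defined in (1.9), 

$$ \mathcal{B}_n(\rho) = \Psi_n(\rho) + \frac{6 \mu\, \mathbf{E}_S |\widehat{\sigma}_n - \sigma|}{1 - 3\rho} $$

and 

$$ \Psi_n(\rho) = \frac{2 \sigma \sigma^* \nu + 4 \sigma c_1^*(n) + 2 \nu c_2^*(n)}{\sigma (1 - 3\rho) \rho} . \tag{2.19} $$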
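
The sharpness of the oracle inequality refers to the leading factor $(1 + 3\rho - 2\rho^2)/(1 - 3\rho)$, which approaches unity as $\rho \to 0$. The following small sketch, with a hypothetical function name, evaluates this factor for a given $\rho$.

```python
def leading_factor(rho):
    """Multiplicative constant (1 + 3*rho - 2*rho^2) / (1 - 3*rho), for 0 < rho < 1/3."""
    if not 0.0 < rho < 1.0 / 3.0:
        raise ValueError("rho must lie strictly between 0 and 1/3")
    return (1.0 + 3.0 * rho - 2.0 * rho ** 2) / (1.0 - 3.0 * rho)
```

Since the additive term carries $\rho$ in its denominator, a smaller $\rho$ brings the leading factor closer to one at the price of a larger additive term.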

Now we check conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) for the model (1.1) with the noise 
(1.4) to arrive at the following result. 

Theorem 2.2. Suppose that the coefficients $\varrho_1$ and $\varrho_2$ in model (1.1), (1.4) 
are such that $\varrho_1^2 + \varrho_2^2 > 0$ and $\mathbf{E}\, Y_j^4 < \infty$. Then the estimation procedure 
(2.17), for any $n \ge 1$ and $0 < \rho < 1/3$, satisfies the oracle inequality (2.18) 
with 

$$ \sigma = \sigma^* = \varrho_1^2 + \lambda \varrho_2^2 , \quad c_1^*(n) = 0 , $$

and 

$$ \sup_{n \ge 1} c_2^*(n) \le 4 \sigma^* \left( \sigma^* + \varrho_2^2\, \mathbf{E}\, Y_j^4 \right) . $$

The proofs of Theorems 2.1 and 2.2 are given in Section 3. 

Corollary 2.3. Let the conditions of Theorem 2.1 hold and the quantity $\sigma$ 
in $\mathbf{C}_1$) be known. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimator (2.17) 
satisfies the oracle inequality 

$$ \mathcal{R}(\widehat{S}_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma \in \Gamma} \mathcal{R}(\widehat{S}_\gamma, S) + \frac{1}{n}\, \Psi_n(\rho) , $$

where $\Psi_n(\rho)$ is given in (2.19). 



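2.1 Estimation of σ 

Now we consider the case of unknown quantity $\sigma$ in the condition $\mathbf{C}_1$). One 
can estimate $\sigma$ as 

$$ \widehat{\sigma}_n = \sum_{j=l}^{n} \widehat{\theta}_{j,n}^2 \quad \text{with} \quad l = [\sqrt{n}] + 1 . \tag{2.20} $$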
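
The estimator of $\sigma$ sums the squared estimated Fourier coefficients over the high-frequency indices $l \le j \le n$ with $l = [\sqrt{n}] + 1$. A minimal sketch, assuming the estimates $\widehat{\theta}_{1,n}, \ldots, \widehat{\theta}_{n,n}$ are stored in a list (the function name is hypothetical):

```python
import math


def sigma_hat(theta_hat, n):
    """Sum of theta_hat_{j,n}^2 over l <= j <= n, where l = [sqrt(n)] + 1.

    theta_hat[j - 1] holds the estimate of the j-th Fourier coefficient.
    """
    l = math.isqrt(n) + 1
    return sum(th * th for th in theta_hat[l - 1:n])
```

The low-frequency coefficients, where the signal is concentrated for a smooth $S$, are left out of the sum.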

Proposition 2.4. Suppose that the conditions of Theorem 2.1 hold and the 
unknown function $S(t)$ is continuously differentiable for $0 \le t < 1$ such that 

$$ |\dot{S}|_1 = \int_0^1 |\dot{S}(t)|\, dt < +\infty . \tag{2.21} $$

Then, for any $n \ge 1$, 

$$ \mathbf{E}_S |\widehat{\sigma}_n - \sigma| \le \frac{\kappa_n(S)}{\sqrt{n}} , \tag{2.22} $$

where 

$$ \kappa_n(S) = 4 |\dot{S}|_1^2 + \sigma + \sqrt{c_2^*(n)} + \frac{4 |\dot{S}|_1 \sqrt{\sigma^*}}{n^{1/4}} + \frac{c_1^*(n)}{\sqrt{n}} . $$

The proof of Proposition 2.4 is given in Section 3. Theorem 2.1 and 
Proposition 2.4 imply the following result. 

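Theorem 2.5. Suppose that the conditions of Theorem 2.1 hold and $S$ satisfies 
the conditions of Proposition 2.4. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, 
the estimate (2.17) satisfies the oracle inequality 

$$ \mathcal{R}(\widehat{S}_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \min_{\gamma \in \Gamma} \mathcal{R}(\widehat{S}_\gamma, S) + \frac{1}{n}\, \mathcal{D}_n(\rho) , \tag{2.23} $$

where 

$$ \mathcal{D}_n(\rho) = \Psi_n(\rho) + \frac{6 \mu\, \kappa_n(S)}{(1 - 3\rho) \sqrt{n}} . $$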

2.2 Specification of weights in the selection procedure 

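Now we will specify the weight coefficients $(\gamma(j))_{j \ge 1}$ in a way proposed in [7] 
for a heteroscedastic discrete time regression model. Consider a numerical 
grid of the form 

$$ \mathcal{A}_n = \{1, \ldots, k^*\} \times \{t_1, \ldots, t_m\} , $$

where $t_i = i \varepsilon$ and $m = [1/\varepsilon^2]$. We assume that both parameters $k^* \ge 1$ and 
$0 < \varepsilon \le 1$ are functions of $n$, i.e. $k^* = k^*(n)$ and $\varepsilon = \varepsilon(n)$, such that 

$$ \lim_{n \to \infty} k^*(n) = +\infty , \quad \lim_{n \to \infty} \frac{k^*(n)}{\ln n} = 0 , \quad \lim_{n \to \infty} \varepsilon(n) = 0 \quad \text{and} \quad \lim_{n \to \infty} n^{\delta} \varepsilon(n) = +\infty \tag{2.24} $$

for any $\delta > 0$. One can take, for example, 

$$ \varepsilon(n) = \frac{1}{\ln(n+1)} \quad \text{and} \quad k^*(n) = \sqrt{\ln(n+1)} $$

for $n \ge 1$. 

For each $\alpha = (\beta, t) \in \mathcal{A}_n$, we introduce the weight sequence 
$\gamma_\alpha = (\gamma_\alpha(j))_{j \ge 1}$ given as 

$$ \gamma_\alpha(j) = \mathbf{1}_{\{1 \le j \le j_0\}} + \left( 1 - (j/\omega_\alpha)^{\beta} \right) \mathbf{1}_{\{j_0 < j \le \omega_\alpha\}} , \tag{2.25} $$

where $j_0 = j_0(\alpha) = [\omega_\alpha / \ln n]$, 

$$ \omega_\alpha = (A_\beta\, t\, n)^{1/(2\beta+1)} \quad \text{and} \quad A_\beta = \frac{(\beta+1)(2\beta+1)}{\pi^{2\beta} \beta} . $$

We set 

$$ \Gamma = \{\gamma_\alpha , \ \alpha \in \mathcal{A}_n\} . \tag{2.26} $$

It will be noted that in this case $\nu = k^* m$. 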
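
The construction of the weight family can be sketched as follows. The sketch assumes the weights have the form $\gamma_\alpha(j) = \mathbf{1}_{\{1 \le j \le j_0\}} + (1 - (j/\omega_\alpha)^\beta)\,\mathbf{1}_{\{j_0 < j \le \omega_\alpha\}}$ with $\omega_\alpha = (A_\beta t n)^{1/(2\beta+1)}$, $A_\beta = (\beta+1)(2\beta+1)/(\pi^{2\beta}\beta)$ and $j_0 = [\omega_\alpha/\ln n]$; it also assumes $n \ge 2$ and that $k^*$ and $\varepsilon$ are supplied by the caller. All function names are hypothetical.

```python
import math


def a_beta(beta):
    """Constant A_beta = (beta + 1)(2*beta + 1) / (pi^(2*beta) * beta)."""
    return (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)


def weight_sequence(beta, t, n, length):
    """First `length` weights gamma_alpha(j), j = 1..length, for alpha = (beta, t)."""
    omega = (a_beta(beta) * t * n) ** (1.0 / (2 * beta + 1))
    j0 = int(omega / math.log(n))
    weights = []
    for j in range(1, length + 1):
        if j <= j0:
            weights.append(1.0)                        # low frequencies are kept
        elif j <= omega:
            weights.append(1.0 - (j / omega) ** beta)  # smooth decay up to omega
        else:
            weights.append(0.0)                        # frequencies above omega
    return weights


def weight_family(n, k_star, eps):
    """Weight sequences for all alpha = (beta, t_i), beta = 1..k_star, t_i = i * eps."""
    m = int(1.0 / eps ** 2)
    grid = [(beta, i * eps) for beta in range(1, k_star + 1) for i in range(1, m + 1)]
    return [weight_sequence(beta, t, n, n) for beta, t in grid]
```

The family returned by `weight_family` contains $k^* m$ sequences, matching the cardinality $\nu = k^* m$ noted above.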

Remark 2.1. It will be observed that the specific form of weights (2.25) was 
proposed by Pinsker [19] for the filtration problem with known smoothness of 
the regression function observed with an additive gaussian white noise in 
continuous time. Nussbaum [18] used these weights for the gaussian regression 
estimation problem in discrete time. 

The minimal mean square risk, called the Pinsker constant, is provided by 
the weighted least squares estimate with the weights where the index $\alpha$ depends 
on the smoothness order of the function $S$. In this case the smoothness order 
is unknown and, instead of one estimate, one has to use a whole family of 
estimates containing in particular the optimal one. 

The problem is to study the properties of the whole class of estimates. 
Below we derive an oracle inequality for this class which yields the best mean 
square risk up to multiplicative and additive constants in the case when the 
smoothness of the unknown function $S$ is not available. Moreover, it will be 
shown that the multiplicative constant tends to unity and the additive one 
vanishes as $n \to \infty$ with the rate higher than any minimax rate. 

In view of the assumptions (2.24), for any $\delta > 0$, one has 

$$ \lim_{n \to \infty} \frac{\nu}{n^{\delta}} = 0 . $$

Moreover, by (2.25) for any $\alpha \in \mathcal{A}_n$, 

$$ \sum_{j=1}^{n} \mathbf{1}_{\{\gamma_\alpha(j) > 0\}} \le \omega_\alpha . $$

Therefore, taking into account that $A_\beta \le A_1 < 1$ for $\beta \ge 1$, we get 

$$ \mu = \mu_n \le (n/\varepsilon)^{1/3} . $$

Therefore, for any $\delta > 0$, 

$$ \lim_{n \to \infty} \frac{\mu_n}{n^{1/3 + \delta}} = 0 . $$

Applying this limiting relation to the analysis of the asymptotic behavior of 
the additive term $\mathcal{D}_n(\rho)$ in (2.23) one comes to the following result. 

Theorem 2.6. Suppose that the conditions of Theorem 2.1 hold and 
$\dot{S} \in \mathcal{L}_1[0,1]$. Then, for any $n \ge 1$ and $0 < \rho < 1/3$, the estimate (2.17) with 
the weight coefficients (2.26) satisfies the oracle inequality (2.23) with the 
additive term $\mathcal{D}_n(\rho)$ obeying, for any $\delta > 0$, the following limiting relation 

lim ^M = 0. 



3 Proofs 

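3.1 Proof of Theorem 2.1 

Substituting (2.13) in (2.11) yields for any $\gamma \in \Gamma$ 

$$ \mathrm{Err}_n(\gamma) = J_n(\gamma) + 2 \sum_{j=1}^{\infty} \gamma(j)\, \theta'_{j,n} + \|S\|^2 - \rho\, \widehat{P}_n(\gamma) , \tag{3.1} $$

where 

$$ \theta'_{j,n} = \widetilde{\theta}_{j,n} - \theta_j \widehat{\theta}_{j,n} = \frac{1}{\sqrt{n}}\, \theta_j \xi_{j,n} + \frac{1}{n}\, \widetilde{\xi}_{j,n} + \frac{1}{n}\, \varsigma_{j,n} + \frac{\sigma - \widehat{\sigma}_n}{n} $$

and the sequences $(\varsigma_{j,n})_{j \ge 1}$ and $(\widetilde{\xi}_{j,n})_{j \ge 1}$ are defined in conditions $\mathbf{C}_1$) and 
$\mathbf{C}_2$). Denoting 

$$ L(\gamma) = \sum_{j=1}^{\infty} \gamma(j) , \quad M(\gamma) = \frac{1}{\sqrt{n}} \sum_{j=1}^{\infty} \gamma(j)\, \theta_j\, \xi_{j,n} , \tag{3.2} $$

and taking into account the definition of the "true" penalty term in (2.15), 
we rewrite (3.1) as 

$$ \mathrm{Err}_n(\gamma) = J_n(\gamma) + 2\, \frac{\sigma - \widehat{\sigma}_n}{n}\, L(\gamma) + 2 M(\gamma) + \frac{2}{n}\, B_{1,n}(\gamma) 
 + 2 \sqrt{P_n(\gamma)}\, \frac{B_{2,n}(e(\gamma))}{\sqrt{\sigma n}} + \|S\|^2 - \rho\, \widehat{P}_n(\gamma) , \tag{3.3} $$

where $e(\gamma) = \gamma / |\gamma|$, the functions $B_{1,n}$ and $B_{2,n}$ are defined in (2.6) and (2.7). 

Let $\gamma_0 = (\gamma_0(j))_{j \ge 1}$ be a fixed sequence in $\Gamma$ and $\widehat{\gamma}$ be as in (2.16). 
Substituting $\gamma_0$ and $\widehat{\gamma}$ in the equation (3.3), we consider the difference 

$$ \mathrm{Err}_n(\widehat{\gamma}) - \mathrm{Err}_n(\gamma_0) = J_n(\widehat{\gamma}) - J_n(\gamma_0) + 2\, \frac{\sigma - \widehat{\sigma}_n}{n}\, L(\widehat{x}) + \frac{2}{n}\, B_{1,n}(\widehat{x}) + 2 M(\widehat{x}) 
 + 2 \sqrt{P_n(\widehat{\gamma})}\, \frac{B_{2,n}(\widehat{e})}{\sqrt{\sigma n}} - 2 \sqrt{P_n(\gamma_0)}\, \frac{B_{2,n}(e_0)}{\sqrt{\sigma n}} - \rho\, \widehat{P}_n(\widehat{\gamma}) + \rho\, \widehat{P}_n(\gamma_0) , $$

where $\widehat{x} = \widehat{\gamma} - \gamma_0$, $\widehat{e} = e(\widehat{\gamma})$ and $e_0 = e(\gamma_0)$. Note that by (2.10) 

$$ |L(\widehat{x})| \le |L(\widehat{\gamma})| + |L(\gamma_0)| \le 2\mu . $$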

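Therefore, by making use of the condition $\mathbf{C}_1$) and taking into account that 
the cost function $J_n$ attains its minimum at $\widehat{\gamma}$, one comes to the inequality 

$$ \mathrm{Err}_n(\widehat{\gamma}) - \mathrm{Err}_n(\gamma_0) \le \frac{4\mu}{n}\, |\widehat{\sigma}_n - \sigma| + \frac{2 c_1^*(n)}{n} + 2 M(\widehat{x}) 
 + 2 \sqrt{P_n(\widehat{\gamma})}\, \frac{B_{2,n}(\widehat{e})}{\sqrt{\sigma n}} - \rho\, \widehat{P}_n(\widehat{\gamma}) 
 + \rho\, \widehat{P}_n(\gamma_0) - 2 \sqrt{P_n(\gamma_0)}\, \frac{B_{2,n}(e_0)}{\sqrt{\sigma n}} . \tag{3.4} $$

Applying the elementary inequality 

$$ 2|ab| \le \varepsilon a^2 + \varepsilon^{-1} b^2 \tag{3.5} $$

with $\varepsilon = \rho$ implies the estimate 

$$ 2 \sqrt{P_n(\gamma)}\, \frac{|B_{2,n}(e(\gamma))|}{\sqrt{\sigma n}} \le \rho\, P_n(\gamma) + \frac{B_{2,n}^2(e(\gamma))}{n \sigma \rho} . $$

We recall that $0 < \rho < 1$. Therefore, from here and (3.4), it follows that 

$$ \mathrm{Err}_n(\widehat{\gamma}) \le \mathrm{Err}_n(\gamma_0) + 2 M(\widehat{x}) + \frac{2 B_{2,n}^*}{n \sigma \rho} + \frac{2 c_1^*(n)}{n} 
 + \frac{1}{n}\, |\widehat{\sigma}_n - \sigma| \left( |\widehat{\gamma}|^2 + |\gamma_0|^2 + 4\mu \right) + 2 \rho\, P_n(\gamma_0) , $$

where $B_{2,n}^* = \sup_{\gamma \in \Gamma} B_{2,n}^2(e(\gamma))$. In view of (2.10), one has 

$$ \sup_{\gamma \in \Gamma} |\gamma|^2 \le \mu . $$

Thus, one gets 

$$ \mathrm{Err}_n(\widehat{\gamma}) \le \mathrm{Err}_n(\gamma_0) + 2 M(\widehat{x}) + \frac{2 B_{2,n}^*}{n \sigma \rho} + \frac{2 c_1^*(n)}{n} 
 + \frac{6\mu}{n}\, |\widehat{\sigma}_n - \sigma| + 2 \rho\, P_n(\gamma_0) . \tag{3.6} $$

In view of Condition $\mathbf{C}_2$), one has 

$$ \mathbf{E}_S\, B_{2,n}^* \le \sum_{\gamma \in \Gamma} \mathbf{E}_S\, B_{2,n}^2(e(\gamma)) \le \nu\, c_2^*(n) , \tag{3.7} $$

where $\nu = \mathrm{card}(\Gamma)$. 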

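Now we examine the first term in the right-hand side of (3.4). Substituting 
(2.4) in (3.2) and taking into account (1.3), one obtains that for any 
non-random sequence $x = (x(j))_{j \ge 1}$ with $\#(x) < \infty$ 

$$ \mathbf{E}_S\, M^2(x) \le \sigma^*\, \frac{1}{n} \sum_{j=1}^{\infty} x^2(j)\, \theta_j^2 = \sigma^*\, \frac{1}{n}\, \|S_x\|^2 , \tag{3.8} $$

where $S_x = \sum_{j=1}^{\infty} x(j)\, \theta_j\, \phi_j$. Let us denote 

$$ Z^* = \sup_{x \in \Gamma_1} \frac{n M^2(x)}{\|S_x\|^2} , $$

where $\Gamma_1 = \Gamma - \gamma_0$. In view of (3.8), this quantity can be estimated as 

$$ \mathbf{E}_S\, Z^* \le \sum_{x \in \Gamma_1} \frac{n\, \mathbf{E}_S\, M^2(x)}{\|S_x\|^2} \le \sum_{x \in \Gamma_1} \sigma^* = \sigma^* \nu . \tag{3.9} $$

Further, by making use of the inequality (3.5) with $\varepsilon = \rho \|S_x\|$, one gets 

$$ 2 |M(x)| \le \rho\, \|S_x\|^2 + \frac{Z^*}{n \rho} . \tag{3.10} $$

Note that, for any $x \in \Gamma_1$, 

$$ \|S_x\|^2 - \|\widehat{S}_x\|^2 = \sum_{j=1}^{\infty} x^2(j) \left( \theta_j^2 - \widehat{\theta}_{j,n}^2 \right) \le -2 M_1(x) , \tag{3.11} $$

where 

$$ M_1(x) = \frac{1}{\sqrt{n}} \sum_{j=1}^{\infty} x^2(j)\, \theta_j\, \xi_{j,n} . $$

Since $|x(j)| \le 1$ for any $x \in \Gamma_1$, one gets 

$$ \mathbf{E}_S\, M_1^2(x) \le \sigma^*\, \frac{\|S_x\|^2}{n} . $$

Denoting 

$$ Z_1^* = \sup_{x \in \Gamma_1} \frac{n M_1^2(x)}{\|S_x\|^2} , $$

one has 

$$ \mathbf{E}_S\, Z_1^* \le \sigma^* \nu . \tag{3.12} $$

By the same argument as in (3.10), one derives 

$$ 2 |M_1(x)| \le \rho\, \|S_x\|^2 + \frac{Z_1^*}{n \rho} . $$

From here and (3.11), one finds the upper bound for $\|S_x\|$, i.e. 

$$ \|S_x\|^2 \le \frac{\|\widehat{S}_x\|^2}{1 - \rho} + \frac{Z_1^*}{n \rho (1 - \rho)} . $$

Using this bound in (3.10) gives 

$$ 2 M(x) \le \frac{\rho\, \|\widehat{S}_x\|^2}{1 - \rho} + \frac{Z^* + Z_1^*}{n \rho (1 - \rho)} . $$

Setting $x = \widehat{x}$ in this inequality and taking into account that 

$$ \|\widehat{S}_{\widehat{x}}\|^2 = \|\widehat{S}_{\widehat{\gamma}} - \widehat{S}_{\gamma_0}\|^2 \le 2 \left( \mathrm{Err}_n(\widehat{\gamma}) + \mathrm{Err}_n(\gamma_0) \right) , $$

we obtain 

$$ 2 M(\widehat{x}) \le \frac{2 \rho \left( \mathrm{Err}_n(\widehat{\gamma}) + \mathrm{Err}_n(\gamma_0) \right)}{1 - \rho} + \frac{Z^* + Z_1^*}{n \rho (1 - \rho)} . $$

From here and (3.6), it follows that 

$$ \mathrm{Err}_n(\widehat{\gamma}) \le \frac{1 + \rho}{1 - 3\rho}\, \mathrm{Err}_n(\gamma_0) + \frac{2}{n (1 - 3\rho)} \left( \frac{B_{2,n}^*}{\sigma \rho} + c_1^*(n) + 3 \mu\, |\widehat{\sigma}_n - \sigma| \right) 
 + \frac{Z^* + Z_1^*}{n \rho (1 - 3\rho)} + \frac{2 \rho (1 - \rho)}{1 - 3\rho}\, P_n(\gamma_0) . $$

Taking the expectation yields 

$$ \mathcal{R}(\widehat{S}_*, S) \le \frac{1 + \rho}{1 - 3\rho}\, \mathcal{R}(\widehat{S}_{\gamma_0}, S) + \frac{2}{n (1 - 3\rho)} \left( \frac{\nu\, c_2^*(n)}{\sigma \rho} + c_1^*(n) + 3 \mu\, \mathbf{E}_S |\widehat{\sigma}_n - \sigma| \right) 
 + \frac{2 \sigma^* \nu}{n \rho (1 - 3\rho)} + \frac{2 \rho (1 - \rho)}{1 - 3\rho}\, P_n(\gamma_0) . $$

Using the upper bound for $P_n(\gamma_0)$ in Lemma A.1 one obtains 

$$ \mathcal{R}(\widehat{S}_*, S) \le \frac{1 + 3\rho - 2\rho^2}{1 - 3\rho}\, \mathcal{R}(\widehat{S}_{\gamma_0}, S) + \frac{1}{n}\, \mathcal{B}_n(\rho) , $$

where $\mathcal{B}_n(\rho)$ is defined in (2.18). 

Since this inequality holds for each $\gamma_0 \in \Gamma$, this completes the proof of 
Theorem 2.1. □ 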



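3.2 Proof of Theorem 2.2 

We have to verify Conditions $\mathbf{C}_1$) and $\mathbf{C}_2$) for the process (1.4). 

Condition $\mathbf{C}_1$) holds with $c_1^*(n) = 0$. This follows from Lemma A.2 if one 
puts $f = g = \phi_j$, $j \ge 1$. Now we check Condition $\mathbf{C}_2$). By the Ito formula 
and Lemma A.2 one gets 

$$ d I_t^2(f) = 2 I_{t-}(f)\, d I_t(f) + \varrho_1^2 f^2(t)\, dt + \varrho_2^2\, d \sum_{0 \le s \le t} f^2(s) (\Delta z_s)^2 $$

and 

$$ \mathbf{E}\, I_t^2(f) = \sigma^* \int_0^t f^2(s)\, ds . $$

Therefore, putting 

$$ \widetilde{I}_t(f) = I_t^2(f) - \mathbf{E}\, I_t^2(f) , $$

we obtain 

$$ d \widetilde{I}_t(f) = \varrho_2^2 f^2(t)\, d m_t + 2 I_{t-}(f) f(t)\, d \xi_t , \quad \widetilde{I}_0(f) = 0 , $$

and 

$$ m_t = \sum_{0 \le s \le t} (\Delta z_s)^2 - \lambda t . \tag{3.14} $$

Now we set 

$$ \widetilde{I}_t(x) = \sum_{j=1}^{\infty} x_j\, \widetilde{I}_t(\phi_j) , $$

where $x = (x_j)_{j \ge 1}$ with $\#(x) \le n$ and $|x| \le 1$. This process obeys the 
equation 

$$ d \widetilde{I}_t(x) = \varrho_2^2\, \Phi_t(x)\, d m_t + 2 \zeta_{t-}(x)\, d \xi_t , \quad \widetilde{I}_0(x) = 0 , $$

where 

$$ \Phi_t(x) = \sum_{j \ge 1} x_j\, \phi_j^2(t) \quad \text{and} \quad \zeta_t(x) = \sum_{j \ge 1} x_j\, I_t(\phi_j)\, \phi_j(t) . $$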

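Now we show that 

$$ \mathbf{E} \int_0^n \widetilde{I}_{t-}(x)\, d \widetilde{I}_t(x) = 0 . \tag{3.15} $$

Indeed, note that 

$$ \int_0^n \widetilde{I}_{t-}(x)\, \Phi_t(x)\, d m_t = \sum_{j \ge 1} x_j \int_0^n \widetilde{I}_{t-}(\phi_j)\, \Phi_t(x)\, d m_t . $$

Therefore, Lemma A.4 directly implies 

$$ \mathbf{E} \int_0^n \widetilde{I}_{t-}(x)\, \Phi_t(x)\, d m_t = \sum_{j \ge 1} x_j\, \mathbf{E} \int_0^n I_{t-}^2(\phi_j)\, \Phi_t(x)\, d m_t 
 - \sum_{j \ge 1} x_j\, \mathbf{E} \int_0^n \left( \mathbf{E}\, I_t^2(\phi_j) \right) \Phi_t(x)\, d m_t = 0 . $$

Moreover, we note that 

$$ \int_0^n \widetilde{I}_{t-}(x)\, \zeta_{t-}(x)\, d \xi_t = \sum_{j \ge 1} x_j \int_0^n \widetilde{I}_{t-}(\phi_j)\, \zeta_{t-}(x)\, d \xi_t $$

and 

$$ \int_0^n \widetilde{I}_{t-}(\phi_j)\, \zeta_{t-}(x)\, d \xi_t = \sum_{l \ge 1} x_l \int_0^n I_{t-}^2(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d \xi_t 
 - \sum_{l \ge 1} x_l \int_0^n \left( \mathbf{E}\, I_t^2(\phi_j) \right) I_{t-}(\phi_l)\, \phi_l(t)\, d \xi_t . $$

From Lemma A.5 it follows 

$$ \mathbf{E} \int_0^n I_{t-}^2(\phi_j)\, I_{t-}(\phi_l)\, \phi_l(t)\, d \xi_t = 0 $$

and we come to (3.15). Furthermore, by the Ito formula one obtains 

$$ \widetilde{I}_n^2(x) = 2 \int_0^n \widetilde{I}_{t-}(x)\, d \widetilde{I}_t(x) + 4 \varrho_1^2 \int_0^n \zeta_t^2(x)\, dt + \varrho_2^4 \sum_{k=1}^{+\infty} \Phi_{T_k}^2(x)\, Y_k^4\, \mathbf{1}_{\{T_k \le n\}} 
 + 4 \varrho_2^2 \sum_{k=1}^{+\infty} \zeta_{T_k-}^2(x)\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}} + 4 \varrho_2^3 \sum_{k=1}^{+\infty} \Phi_{T_k}(x)\, \zeta_{T_k-}(x)\, Y_k^3\, \mathbf{1}_{\{T_k \le n\}} . $$

By Lemma A.3 one has $\mathbf{E}(\zeta_{T_k-}(x) \,|\, T_k) = 0$. Therefore, taking into account 
(3.15), we calculate 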

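$$ \mathbf{E}\, \widetilde{I}_n^2(x) = 4 \varrho_1^2\, \mathbf{E} \int_0^n \zeta_t^2(x)\, dt + \varrho_2^4\, \mathbf{E}\, Y_1^4\, D_{1,n}(x) + 4 \varrho_2^2\, D_{2,n}(x) , \tag{3.16} $$

where 

$$ D_{1,n}(x) = \sum_{k=1}^{\infty} \mathbf{E}\, \Phi_{T_k}^2(x)\, \mathbf{1}_{\{T_k \le n\}} \quad \text{and} \quad D_{2,n}(x) = \sum_{k=1}^{\infty} \mathbf{E}\, \zeta_{T_k-}^2(x)\, \mathbf{1}_{\{T_k \le n\}} . $$

By applying Lemma A.2 one has 

$$ \mathbf{E} \int_0^n \zeta_t^2(x)\, dt = \sum_{i,j \ge 1} x_i x_j \int_0^n \phi_i(t) \phi_j(t)\, \mathbf{E}\, I_t(\phi_i) I_t(\phi_j)\, dt 
 = \frac{\sigma^*}{2} \sum_{i,j \ge 1} x_i x_j \left( \int_0^n \phi_i(t) \phi_j(t)\, dt \right)^2 \le \frac{\sigma^* n^2}{2} . \tag{3.17} $$

Further it is easy to check that 

$$ D_{1,n}(x) = \lambda \int_0^n \Phi_t^2(x)\, dt = \lambda \int_0^n \Big( \sum_{j \ge 1} x_j\, \phi_j^2(t) \Big)^2 dt . $$

Therefore, taking into account that $\#(x) \le n$ and $|x| \le 1$, we estimate $D_{1,n}$ 
by applying the Cauchy-Schwarz-Bunyakovsky inequality 

$$ D_{1,n}(x) \le 4 \lambda n\, |x|^2\, \#(x) \le 4 \lambda n^2 . \tag{3.18} $$

Finally, we write down the process $\zeta_t(x)$ as 

$$ \zeta_t(x) = \int_0^t Q_x(t,s)\, d \xi_s \quad \text{with} \quad Q_x(t,s) = \sum_{j \ge 1} x_j\, \phi_j(s)\, \phi_j(t) . $$

By putting 

$$ \widetilde{D}_{2,n} = \sum_{k=2}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_k \le n\}} \sum_{j=1}^{k-1} Q_x^2(T_k, T_j) $$

and applying Lemma A.3 we obtain 

$$ D_{2,n} = \varrho_1^2 \sum_{k=1}^{\infty} \mathbf{E} \int_0^{T_k} Q_x^2(T_k, s)\, ds\, \mathbf{1}_{\{T_k \le n\}} + \varrho_2^2\, \widetilde{D}_{2,n} 
 = \lambda \varrho_1^2 \int_0^n \int_0^t Q_x^2(t,s)\, ds\, dt + \varrho_2^2\, \widetilde{D}_{2,n} . $$

Moreover, one can rewrite the second term in the last equality as 

$$ \widetilde{D}_{2,n} = \sum_{l=1}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \sum_{k=l+1}^{\infty} Q_x^2(T_k, T_l)\, \mathbf{1}_{\{T_k \le n\}} 
 = \lambda^2 \int_0^n \left( \int_0^{n-s} Q_x^2(s+z, s)\, dz \right) ds = \lambda^2 \int_0^n \left( \int_0^t Q_x^2(t,s)\, ds \right) dt . $$

Thus, 

$$ D_{2,n} \le (\lambda \varrho_1^2 + \lambda^2 \varrho_2^2) \int_0^n \left( \int_0^n Q_x^2(t,s)\, ds \right) dt 
 \le (\lambda \varrho_1^2 + \lambda^2 \varrho_2^2)\, n^2 = \lambda \sigma^* n^2 . \tag{3.19} $$

The equation (3.16) and the inequalities (3.17)-(3.19) imply the validity of 
condition $\mathbf{C}_2$) for the process (1.4). Hence Theorem 2.2. □ 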



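3.3 Proof of Proposition 2.4 

Substituting (2.4) in (2.20) yields 

$$ \widehat{\sigma}_n = \sum_{j=l}^{n} \theta_j^2 + \frac{2}{\sqrt{n}} \sum_{j=l}^{n} \theta_j\, \xi_{j,n} + \frac{1}{n} \sum_{j=l}^{n} \xi_{j,n}^2 . \tag{3.20} $$

Further, denoting 

$$ x'_j = \mathbf{1}_{\{l \le j \le n\}} \quad \text{and} \quad x''_j = \frac{1}{\sqrt{n}}\, \mathbf{1}_{\{l \le j \le n\}} , $$

we represent the last term in (3.20) as 

$$ \frac{1}{n} \sum_{j=l}^{n} \xi_{j,n}^2 = \frac{1}{n}\, B_{1,n}(x') + \frac{1}{\sqrt{n}}\, B_{2,n}(x'') + \frac{n - l + 1}{n}\, \sigma , $$

where the functions $B_{1,n}(\cdot)$ and $B_{2,n}(\cdot)$ are defined in conditions $\mathbf{C}_1$) and $\mathbf{C}_2$). 
Combining these equations leads to the inequality 

$$ |\widehat{\sigma}_n - \sigma| \le \sum_{j \ge l} \theta_j^2 + \frac{2}{\sqrt{n}} \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| + \frac{1}{n}\, |B_{1,n}(x')| + \frac{1}{\sqrt{n}}\, |B_{2,n}(x'')| + \frac{l-1}{n}\, \sigma . $$

By Lemma A.6 and conditions $\mathbf{C}_1$), $\mathbf{C}_2$), one gets 

$$ \mathbf{E}_S |\widehat{\sigma}_n - \sigma| \le \frac{4 |\dot{S}|_1^2}{\sqrt{n}} + \frac{c_1^*(n)}{n} + \frac{\sqrt{c_2^*(n)}}{\sqrt{n}} + \frac{\sigma}{\sqrt{n}} + \frac{2}{\sqrt{n}}\, \mathbf{E}_S \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| . $$

In view of the inequality (1.3), the last term can be estimated as 

$$ \mathbf{E}_S \Big| \sum_{j=l}^{n} \theta_j\, \xi_{j,n} \Big| \le \sqrt{\sigma^* \sum_{j=l}^{n} \theta_j^2} \le \frac{2 |\dot{S}|_1 \sqrt{\sigma^*}}{n^{1/4}} . $$

Hence Proposition 2.4. □ 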



4 Appendix 

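A.1 Property of the penalty term (2.15) 

Lemma A.1. Assume that the condition $\mathbf{C}_1$) holds with $\sigma > 0$. Then for 
any $n \ge 1$ and $\gamma \in \Gamma$, 

$$ P_n(\gamma) \le \mathbf{E}_S\, \mathrm{Err}_n(\gamma) + \frac{c_1^*(n)}{n} . $$

Proof. By the definition of $\mathrm{Err}_n(\gamma)$ one has 

$$ \mathrm{Err}_n(\gamma) = \sum_{j=1}^{\infty} \left( (\gamma(j) - 1)\, \theta_j + \gamma(j)\, \frac{1}{\sqrt{n}}\, \xi_{j,n} \right)^2 . $$

In view of the condition $\mathbf{C}_1$) this leads to the desired result 

$$ \mathbf{E}_S\, \mathrm{Err}_n(\gamma) \ge \frac{1}{n} \sum_{j=1}^{\infty} \gamma^2(j)\, \mathbf{E}\, \xi_{j,n}^2 = P_n(\gamma) + \frac{1}{n}\, B_{1,n}(\gamma^2) \ge P_n(\gamma) - \frac{c_1^*(n)}{n} , $$

where $\gamma^2 = (\gamma^2(j))_{j \ge 1}$. 

□ 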



A.2 Properties of the process (1.4) 

Lemma A.2. Let $f$ and $g$ be any non-random functions from $\mathcal{L}_2[0,n]$ and 
$(I_t(f))_{t \ge 0}$ be the process defined by (1.2) for the noise (1.4). Then, for any $0 \le t \le n$, 

$$ \mathbf{E}\, I_t(f)\, I_t(g) = \sigma^* \int_0^t f(s)\, g(s)\, ds , $$

where $\sigma^* = \varrho_1^2 + \lambda \varrho_2^2$. 

This Lemma is a direct consequence of Ito's formula as well as the following 
result. 

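Lemma A.3. Let $Q$ be a bounded $[0, \infty) \times \Omega \to \mathbb{R}$ function measurable with 
respect to $\mathcal{B}[0, +\infty) \otimes \mathcal{G}_k$, where 

$$ \mathcal{G}_k = \sigma\{T_1, \ldots, T_k\} \quad \text{with some } k \ge 2 . \tag{A.1} $$

Then 

$$ \mathbf{E} \left( I_{T_k-}(Q) \,|\, \mathcal{G}_k \right) = 0 $$

and 

$$ \mathbf{E} \left( I_{T_k-}^2(Q) \,|\, \mathcal{G}_k \right) = \varrho_1^2 \int_0^{T_k} Q^2(s)\, ds + \varrho_2^2 \sum_{l=1}^{k-1} Q^2(T_l) . $$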
Now we will study stochastic cadlag processes $\eta = (\eta_t)_{0 \le t \le n}$ of the form 

$$ \eta_t = \sum_{l=0}^{\infty} \upsilon_l(t)\, \mathbf{1}_{\{T_l \le t < T_{l+1}\}} , \tag{A.2} $$

where $\upsilon_0(t)$ is a function measurable with respect to $\sigma\{w_s , s \le t\}$ and the 
coefficient $\upsilon_l(t)$, $l \ge 1$, is a function measurable with respect to 

Now we show the following result. 

Lemma A.4. Let $\eta = (\eta_t)_{0 \le t \le n}$ be a stochastic non-negative process given 
by (A.2), such that 

$$ \mathbf{E} \int_0^n \eta_u\, du < \infty . $$

Then 

$$ \mathbf{E} \int_0^n \eta_{u-}\, d m_u = 0 , $$

where the process $m = (m_t)$ is defined in (3.14). 






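Proof. Note that the stochastic integral, with respect to the martingale 
(3.14), can be written as 

$$ \int_0^n \eta_{u-}\, d m_u = \sum_{0 \le u \le n} \eta_{u-} (\Delta z_u)^2 - \lambda \int_0^n \eta_u\, du 
 = \sum_{k=1}^{+\infty} \eta_{T_k-}\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}} - \lambda \int_0^n \eta_u\, du . $$

Therefore, taking into account the representation (A.2), we obtain 

$$ \int_0^n \eta_{u-}\, d m_u = \Upsilon_1 - \lambda \Upsilon_2 , \tag{A.3} $$

where 

$$ \Upsilon_1 = \sum_{k=1}^{+\infty} \upsilon_{k-1}(T_k-)\, Y_k^2\, \mathbf{1}_{\{T_k \le n\}} \quad \text{and} \quad \Upsilon_2 = \int_0^n \eta_u\, du . $$

Recalling that $\mathbf{E}\, Y_k^2 = 1$ and $\upsilon_k \ge 0$, we calculate 

$$ \mathbf{E}\, \Upsilon_1 = \sum_{k=1}^{+\infty} \mathbf{E}\, \upsilon_{k-1}(T_k-)\, \mathbf{1}_{\{T_k \le n\}} 
 = \lambda \sum_{k=1}^{+\infty} \mathbf{E}\, \mathbf{1}_{\{T_{k-1} \le n\}} \int_0^{n - T_{k-1}} \upsilon_{k-1}((T_{k-1} + u)-)\, e^{-\lambda u}\, du . $$

Moreover, the functions $(\upsilon_k)$ are cadlag processes, therefore the Lebesgue 
measure of the set $\{t \in \mathbb{R}_+ : \upsilon_k(t-) \ne \upsilon_k(t)\}$ equals zero. Thus, 

$$ \int_0^{n - T_l} \upsilon_l((T_l + u)-)\, e^{-\lambda u}\, du = \int_0^{n - T_l} \upsilon_l(T_l + u)\, e^{-\lambda u}\, du . $$

This implies 

$$ \mathbf{E}\, \Upsilon_1 = \lambda \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_0^{n - T_l} \upsilon_l(T_l + u)\, e^{-\lambda u}\, du . \tag{A.4} $$

Similarly we obtain 

$$ \mathbf{E}\, \Upsilon_2 = \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_0^n \upsilon_l(t)\, \mathbf{1}_{\{T_l \le t < T_{l+1}\}}\, dt 
 = \sum_{l=0}^{\infty} \mathbf{E}\, \mathbf{1}_{\{T_l \le n\}} \int_0^{n - T_l} \upsilon_l(T_l + u)\, e^{-\lambda u}\, du . \tag{A.5} $$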






Substituting (A.4) and (A.5) in (A.3) implies the assertion of Lemma A.4. 
□ 

Lemma A.5. Assume that $\mathbf{E}\, Y_j^4 < \infty$. Then, for any measurable bounded 
non-random functions $f$ and $g$, one has 

$$ \mathbf{E} \int_0^n I_{t-}^2(f)\, I_{t-}(g)\, g(t)\, d \xi_t = 0 . $$

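Proof. First we note that 

$$ \mathbf{E} \int_0^n I_{t-}^2(f)\, I_{t-}(g)\, g(t)\, d z_t = \sum_{j \ge 1} \mathbf{E}\, I_{T_j-}^2(f)\, I_{T_j-}(g)\, g(T_j)\, \mathbf{1}_{\{T_j \le n\}}\, \mathbf{E}\, Y_j = 0 . $$

Therefore, to prove this lemma one has to show that 

$$ \mathbf{E} \int_0^n I_t^2(f)\, I_t(g)\, g(t)\, d w_t = 0 . \tag{A.6} $$

To this end we represent the stochastic integral $I_t(f)$ as 

$$ I_t(f) = \varrho_1\, I_t^w(f) + \varrho_2\, I_t^z(f) , $$

where 

$$ I_t^w(f) = \int_0^t f_s\, d w_s \quad \text{and} \quad I_t^z(f) = \int_0^t f_s\, d z_s . $$

Note that 

$$ \mathbf{E}\, |I_t^z(f)|^4 \le M^4\, \mathbf{E}\, Y_1^4\, \mathbf{E}\, N_n^2 = M^4\, \mathbf{E}\, Y_1^4\, (\lambda n + \lambda^2 n^2) < \infty , $$

where 

$$ M = \sup_{0 \le t \le n} \left( |f(t)| + |g(t)| \right) . $$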

Therefore, taking into account that the processes (w t ) and (z t ) are indepen- 
dent, we get 



i.e. 



rn 

E / lf(f)(I-(g)) 2 g(t)dt < oo, 

rn 

E / I?(f)I™(g)g(t)dw t = 0. 






Similarly, we obtain 



E 



and 



E 



{I™{f)yP t {g)g{t)dw t = 



Therefore, to show (A.6) one has to check that 



E 



rn 

/ = 0. 



(A.7) 



where 



Taking into account that the processes (r) t ) and (w t ) are independent, we get 



E 



T] t dw t 



< E 



/ rf t dt < y/nE sup \rj f \ . 

Jn 0<t<n 



Here, the last term can be estimated as 



E sup \r} t \ < M 4 J2\ Y j\ - M^ElY^EN* < oo . 



0<t<n 



Hence the stochastic integral $\int_0^n \eta_t\, d w_t$ is an integrable random variable and 

$$ \mathbf{E} \int_0^n \eta_t\, d w_t = \mathbf{E}\, \mathbf{E} \left( \int_0^n \eta_t\, d w_t \,\Big|\, \eta_t ,\ 0 \le t \le n \right) = 0 . $$

Thus we obtain the equality (A.7) which implies (A.6). Hence Lemma A.5. 

□ 



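A.3 Property of the Fourier coefficients 

Lemma A.6. Suppose that the function $S$ in (1.1) is differentiable and satisfies 
the condition (2.21). Then the Fourier coefficients (2.2) satisfy the 
inequality 

$$ \sup_{l \ge 2}\, l \sum_{j=l}^{\infty} \theta_j^2 \le 4 |\dot{S}|_1^2 . $$

Proof. In view of (2.1), one has 

$$ \theta_{2p} = -\frac{1}{\sqrt{2}\, \pi p} \int_0^1 \dot{S}(t) \sin(2 \pi p t)\, dt $$

and 

$$ \theta_{2p+1} = \frac{1}{\sqrt{2}\, \pi p} \int_0^1 \dot{S}(t) \left( \cos(2 \pi p t) - 1 \right) dt 
 = -\frac{\sqrt{2}}{\pi p} \int_0^1 \dot{S}(t) \sin^2(\pi p t)\, dt , \quad p \ge 1 . $$

From here, it follows that, for any $j \ge 2$, 

$$ \theta_j^2 \le \frac{2 |\dot{S}|_1^2}{j^2} . $$

Taking into account that 

$$ \sup_{l \ge 2}\, l \sum_{j=l}^{\infty} \frac{1}{j^2} \le 2 , $$

we arrive at the desired result. □ 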



References 

[1] Akaike, H. (1974) A new look at the statistical model identification. 
IEEE Trans. on Automatic Control, 19, p. 716-723. 

[2] Barron, A., Birge, L. and Massart, P. (1999) Risk bounds for model 
selection via penalization. Probab. Theory Relat. Fields, 113, p. 301-415. 

[3] Cao, Y. and Golubev, Y. (2005) On oracle inequalities related to a 
polynomial fitting. Mathematical Methods of Statistics, 14(4), p. 431-450. 

[4] Cavalier, L., Golubev, G.K., Picard, D. and Tsybakov, A. (2002) Oracle 
inequalities for inverse problems. The Annals of Statistics, 30, p. 843-874. 

[5] Fourdrinier, D. and Pergamenshchikov, S. (2007) Improved selection 
model method for the regression with dependent noise. Annals of the 
Institute of Statistical Mathematics, 59(3), p. 435-464. 

[6] Galtchouk, L. and Pergamenshchikov, S. (2004) Non-parametric sequential 
estimation of the drift in diffusion processes. Mathematical 
Methods of Statistics, 13(1), p. 25-49. 

[7] Galtchouk, L. and Pergamenshchikov, S. (2009) Sharp non-asymptotic 
oracle inequalities for non-parametric heteroscedastic regression models. 
Journal of Non-parametric Statistics, 21(1), p. 1-16. 

[8] Galtchouk, L. and Pergamenshchikov, S. (2009) Adaptive asymptotically 
efficient estimation in heteroscedastic non-parametric regression. 
Journal of Korean Statistical Society, http://ees.elsivier.com/jkss/ 

[9] Galtchouk, L. and Pergamenshchikov, S. (2009) Adaptive asymptotically 
efficient estimation in heteroscedastic non-parametric regression 
via model selection. http://hal.archives-ouvertes.fr/hal-0032691/fr/ 

[10] Galtchouk, L. and Pergamenshchikov, S. (2007) Adaptive sequential 
estimation for ergodic diffusion processes in quadratic metric. Part 1. 
Sharp non-asymptotic oracle inequalities. Prepublication 2007/06, 
IRMA, Universite Louis Pasteur de Strasbourg. 

[11] Galtchouk, L. and Pergamenshchikov, S. (2007) Adaptive sequential 
estimation for ergodic diffusion processes in quadratic metric. Part 2. 
Asymptotic efficiency. Prepublication 2007/07, IRMA, Universite 
Louis Pasteur de Strasbourg. 

[12] Goldfeld, S.M. and Quandt, R.E. (1972) Nonlinear Methods in Econometrics. 
North-Holland, London. 

[13] Kneip, A. (1994) Ordered linear smoothers. Annals of Statistics, 22, p. 
835-866. 

[14] Konev, V.V. and Pergamenshchikov, S.M. (2008) General model selection 
estimation of a periodic regression with a Gaussian noise. Annals 
of the Institute of Statistical Mathematics. Available online at 
http://dx.doi.org/10.1007/s10463-008-0193-1 

[15] Jacod, J. and Shiryaev, A.N. (1987) Limit theorems for stochastic processes. 
Vol. 1, Springer, New York. 

[16] Mallows, C. (1973) Some comments on $C_p$. Technometrics, 15, p. 661-675. 

[17] Massart, P. (2004) A non-asymptotic theory for model selection. 4ECM 
Stockholm, p. 309-323. 

[18] Nussbaum, M. (1985) Spline smoothing in regression models and 
asymptotic efficiency in $L_2$. Ann. Statist., 13, p. 984-997. 

[19] Pinsker, M.S. (1981) Optimal filtration of square integrable signals in 
gaussian white noise. Problems of Information Transmission, 17, p. 120-133. 



